Exploring Speculative Parallelism in Spec2006 Exploring Speculative Parallelism in Spec2006 Exploring Speculative Parallelism in Spec2006

نویسندگان

  • Venkatesan Packirisamy
  • Antonia Zhai
چکیده

Computer industry has adopted multi-threaded and multi-core architectures as the clock rate increase stalled in early 2000’s. It was hoped that the continuous improvement of single-program performance could be achieved through these architectures. However, traditional parallelizing compilers often fail to effectively parallelize general-purpose applications which typically have complex control flow and excessive pointer usage. Recently hardware techniques like Transactional Memory (TM) and Thread-Level Speculation (TLS) have been proposed to simplify the task of parallelization by using speculative threads. Potential of speculative parallelism in general-purpose applications like SPEC CPU 2000 have been well studied and have shown to be moderately successful. Preliminary work that examined the potential parallelism in SPEC2006 deployed parallel threads with a restrictive TLS execution model and limited compiler support, and thus showed only limited performance potential. In this paper, we first analyze the cross-iteration dependence behavior of SPEC 2006 benchmarks and show that more parallelism potential is available in SPEC 2006 benchmarks, comparing against SPEC2000. Further, we use a state-of-the-art profile-driven TLS compiler to identify loops that can be speculatively parallelized. Overall, we found an average speedup of 60% on four cores over what could be achieved by a traditional parallelizing compiler such as Intel’s ICC compiler on such benchmarks. We also found that an additional 11% improvement could be obtained on selected benchmarks using 8 cores when we extend TLS on multiple loop levels as opposed to restricting TLS only on a single loop level.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Dynamic Parallel Media Processing using Speculative Broadcast Loop ( SBL ) ( Extended

This paper presents the results of a study of dynamic parallel media processing using Speculative Broadcast Loop (SBL), a speculative run-time loop-level parallelization method. Due to processing regularity, multimedia applications typically contain extensive parallelism. Data parallelism between independent loop iterations may be supported by subword parallelism methods, but much of the data p...

متن کامل

Exploring the Capacity of a Modern SMT Architecture to Deliver High Scientific Application Performance

Simultaneous multithreading (SMT) has been proposed to improve system throughput by overlapping instructions from multiple threads on a single wide-issue processor. Recent studies have demonstrated that heterogeneity of simultaneously executed applications can bring up significant performance gains due to SMT. However, the speedup of a single application that is parallelized into multiple threa...

متن کامل

Storage Efficient Hardware Prefetching using Delta-Correlating Prediction Tables

This paper presents a novel prefetching heuristic called Delta Correlating Prediction Tables (DCPT). DCPT builds upon two previously proposed techniques, RPT prefetching by Chen and Baer and PC/DC prefetching by Nesbit and Smith. It combines the storageefficient table based design of Reference Prediction Tables (RPT) with the high performance delta correlating design of PC/DC. DCPT substantiall...

متن کامل

Hints based Speculative Execution for Exploiting Probabilistic Parallelism

Performance growth of processors will depend on the amount of parallelism exposed by software; at the same time, many algorithms are inherently sequential: they have dependencies that prevent parallelism from being automatically exploited. In this paper we describe a new type of parallelism called probabilistic parallelism that can be unlocked in inherently sequential programs using speculative...

متن کامل

The Effect of Speculative Execution on Cache Performance

Superscalar microprocessors obtain high performance by exploiting parallelism at the instruction level. To effectively use the instruction-level parallelism found in general purpose, non-numeric code, future processors will need to speculatively execute far beyond instruction fetch limiting conditional branches. One result of this deep speculation is an increase in the number of instruction and...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2008